ABSTRACT


As unmanned aerial vehicle (UAV) technology and availability
improve, it becomes increasingly important to operate UAVs
efficiently. Utilizing one UAV at a time is a relatively
simple task, but when multiple UAVs need to be coordinated,
optimal search plans can be difficult to create in a timely
manner. In this thesis, we create a decision aid that
generates efficient routes for multiple UAVs using dynamic
programming and a limited-look-ahead heuristic. The goal is
to give the user the best knowledge of the locations of an
arbitrary number of targets operating on a specified graph
of nodes and arcs. The decision aid incorporates information
about detections and nondetections and determines the
probabilities of target locations using Bayesian updating.
Target movement is modeled by a Markov process. The decision
aid has been tested in two multi-hour field experiments
involving actual UAVs and moving targets on the ground.
EXECUTIVE SUMMARY


Since its conception, the unmanned aerial vehicle (UAV) 
has been a coveted battlefield asset. The ability of these 
vehicles to perform reconnaissance and attack missions while 


keeping the operators directly out of harm’s way creates an 





advantage in the domains of information gathering and force 


protection. UAVs have only recently been introduced on the 





battlefield in significant numbers, and the ability to 





operate multiple UAVs efficiently and effectively can be 


improved further. 


This thesis creates a decision aid that provides 
efficient search routes for multiple UAVs searching for 
multiple targets operating on a known graph of nodes and 


arcs. The decision aid dynamically provides estimates of 





target locations during its use. 


The decision aid consists of a dynamic program that is 
solved approximately using a two-timestep look-ahead 
heuristic. Target location probabilities are computed using 
Bayesian updating based on the detections and nondetections 
from the previous timestep. The decision aid includes the 


possibility for UAVs to go on and offline due to mechanical 





difficulties or limited endurance. 





The decision aid was tested in two field experiments at 
Camp Roberts, California, as part of the USSOCOM-NPS Field 


Experimentation Program. The field experiments included up 








to three UAVs and five target vehicles. For the second 


experiment, a prototype of the decision aid running through 








a Microsoft Excel user interface was used. The interface
proved to be highly effective in communicating to the user
the current knowledge of target locations and provided
timely recommendations for the UAV operators.
I. INTRODUCTION


A. MOTIVATION AND PROBLEM DEFINITION 


Search for moving targets arises in many different
contexts. For example, searching is necessary when the goal
is to find drug smugglers or shot-down pilots during search
and rescue missions. The sensors used for these searches
are often mounted on unmanned aerial vehicles (UAVs), thus
UAVs become search assets. When multiple UAVs interact
during a search, there arises a need to effectively operate
and manage them within the search environment.

We consider a finite number of searchers and targets
that move on a graph of nodes and arcs. We assume the
searchers have a close estimate of the number of targets.
The targets remain within the graph and move according to a
known Markov process. The overall goal is to route the
searchers during a finite time horizon so that the search
coordinator gains the maximum situational awareness of all
targets, as quantified by probability distributions of
target locations. There are many possible objective
functions for problems of this kind. We specifically aim to
maximize the expected number of detected targets until the
finite time horizon while ignoring targets that are known to
be located at a given location with a probability larger
than a specified threshold. Target thresholds are discussed
in detail in Section A of Chapter II. We refer to this
problem as the search optimization problem (SOP). In this
thesis, we develop a model for SOP and a heuristic algorithm
for obtaining efficient search plans in real-time within a
rolling time horizon framework.




The graph in SOP could represent a road network where 





nodes are intersections and arcs are roads. Alternatively, 
the graph could represent a grid of area cells on the open 


ocean. Figure 1 shows an example of nodes and arcs in a 





road network at Camp Roberts, California. 


Figure 1. Example of Graph.





Currently, no tractable model of SOP exists that
incorporates all major aspects of real-world operations.
SOP is difficult to solve optimally because the optimal move
for the searchers at a timestep is dependent on the future
searcher locations and actions as well as target location
probabilities. We refer to such locations, actions, and
probabilities at a particular point in time as the “state”
at that time. This dependence on future states requires the
use of dynamic programming. This situation tends to result
in intractable model formulations of SOP that cannot be
solved quickly enough for use in a real-time decision aid.
Dynamic programming is discussed in subsection B2.








In this thesis, we develop a new version of a decision
aid called Aerial Search Optimization Model (ASOM); see,
e.g., [15, 16]. It consists of a tractable model for SOP, an
associated heuristic algorithm for generating search 


policies, and a user interface. ASOM is specifically 





tailored for use by UAV operators, provides effective UAV


routes quickly, and is relevant to many different search 


applications. 
B. FUNDAMENTAL CONCEPTS 
1. Bayesian Updating 


Bayesian updating in the context of search is a process
that begins with prior knowledge of target location
probabilities, commonly referred to as the a priori map.
This map is based on previous information, if such
information exists; absent prior information, it is assumed
to be uniform. Figure 2 gives an example of a four-cell a
priori map where a single searcher is searching for a single
stationary target known to be present in the map. In this
thesis, we account for false negatives, but we assume that a
searcher will not report a target on a node or an arc if
there is no target at that node or arc (i.e., no false
positives). We refer to Chung and Burdick [3] for a
discussion of false positive reports. If the searcher looks
in the top left cell and fails to find the target, then
Figure 3 shows the resulting posterior map given the
searcher has a 0.5 conditional probability of detection. The
posterior map is computed by the following equation:


$$P(A_i \mid D') = \frac{P(D' \mid A_i)\, P(A_i)}{\sum_j P(D' \mid A_j)\, P(A_j)}$$

where

$i, j$              index of target cells
$P(A_i)$            probability target is located in cell i
$P(D' \mid A_i)$    probability of no detection in cell i given
                    target is in cell i
$P(A_i \mid D')$    probability target is located in cell i
                    given no detection is made in that cell


For each cell, the updated probabilities are computed 





by multiplying the probability of no detection given there 





is a target in the cell by the prior probability there is a 














target in the cell. This number must then be divided by the 
sum of these numbers for all cells in order to normalize the 
probabilities. See Wagner, Mylander, and Sanders [20] for a 





more detailed mathematical explanation of Bayesian Updating. 
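
The nondetection update described above can be sketched in a
few lines of Python. This is only an illustration, not the
ASOM code: the function name, the cell indexing, and the
uniform four-cell prior are assumptions made here, and the
values shown in Figures 2 and 3 are not reproduced. The
likelihood of no detection is taken to be 1 for every cell
that was not searched, which follows from the no-false-
positive assumption.

```python
def nondetection_update(prior, searched_cell, p_detect):
    """Return posterior cell probabilities after a failed look in one cell."""
    # P(D' | A_i): chance of no detection given the target is in cell i.
    # It is 1 - p_detect in the searched cell and 1 elsewhere, because a
    # cell that was not searched cannot produce a detection.
    likelihood = [
        1.0 - p_detect if i == searched_cell else 1.0
        for i in range(len(prior))
    ]
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]
    total = sum(unnormalized)  # denominator of the update equation
    return [value / total for value in unnormalized]


# Assumed example: uniform prior over four cells, failed look in cell 0.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = nondetection_update(prior, searched_cell=0, p_detect=0.5)
print([round(p, 4) for p in posterior])
```

With these assumed inputs, the searched cell drops from 1/4
to 1/7 and each of the other three cells rises to 2/7.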











Figure 2. A Priori Target Distribution. 











Figure 3. Posterior Target Distribution. 





The above discussion deals with “false negatives,” 








which occur when a searcher fails to detect a target that is 


actually there. 


2. Dynamic Programming 





Dynamic programming is a framework for modeling
decisions made over time [14]. The state of a dynamic
program is a snapshot of the system being modeled at a
specific time. Given a finite time horizon and a finite
number of states, the backward recursion algorithm generates
optimal decisions at every timestep, starting from the end
and working backwards. However, this involves examining all
states at each timestep and determining the best decision at
that state.
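
As a rough illustration of backward recursion, the sketch
below solves a small deterministic toy problem. Everything in
it is an assumption made for illustration (three nodes on a
line, a fixed payoff per node, moves to adjacent nodes); SOP
itself has stochastic elements and a far larger state space.

```python
def backward_recursion(states, actions, transition, reward, horizon):
    """Finite-horizon backward recursion for a deterministic toy problem.

    Returns value[t][s] (best total reward from state s at time t) and
    policy[t][s] (the action achieving it) for t = 0 .. horizon - 1.
    """
    # Terminal values are zero; value[horizon] is the boundary condition.
    value = [{s: 0.0 for s in states} for _ in range(horizon + 1)]
    policy = [{} for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):       # work backwards in time
        for s in states:                       # examine every state
            best_a, best_v = None, float("-inf")
            for a in actions(s):
                v = reward(s, a) + value[t + 1][transition(s, a)]
                if v > best_v:
                    best_a, best_v = a, v
            value[t][s] = best_v
            policy[t][s] = best_a
    return value, policy


# Hypothetical toy data: nodes 0, 1, 2 on a line with fixed payoffs.
payoff = [0.0, 1.0, 5.0]
states = [0, 1, 2]


def actions(s):
    # An action is the node to occupy next: stay or move to a neighbor.
    return [n for n in (s - 1, s, s + 1) if 0 <= n <= 2]


def transition(s, a):
    return a


def reward(s, a):
    return payoff[a]


value, policy = backward_recursion(states, actions, transition, reward, 2)
print(value[0][0], policy[0][0])
```

The nested loops over timesteps, states, and actions make the
cost of examining every state at every timestep explicit.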


The backward recursion algorithm breaks down if there 





are an infinite number of states and/or the determination of 


the best decision at a state is a difficult optimization 








problem. In addition, it may be problematic to use this 
algorithm if the time horizon is not known. 

Approximate dynamic programming algorithms seek to
overcome the shortcomings of the backward recursion
algorithm by introducing approximations. There exist a
large number of approximate dynamic programming algorithms;
see, e.g., [14]. Typically, these algorithms step forward
in time. The main difficulty is to determine the “value” of
transitioning to a specific state. One technique is to use
a limited look-ahead. This is a process of enumerating all
possible moves for all timesteps of the designated
look-ahead period and making the moves that achieve the
greatest reward in terms of the objective function. Longer
look-ahead periods will better approximate the optimal
dynamic programming solution. We will use an approximate
dynamic programming algorithm because it provides an
effective solution that can be provided in real-time, a key
requirement for our implementation.
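
A limited look-ahead can be sketched as plain enumeration of
every move sequence of a fixed length. The graph, the payoffs,
and the function name below are hypothetical, and the payoffs
are held fixed for simplicity, whereas in a search setting the
reward of a later move would depend on the probability updates
caused by earlier moves.

```python
def best_path(node, neighbors, payoff, depth):
    """Enumerate every path of `depth` moves from `node`.

    Returns (total payoff, list of nodes visited) for the best path.
    """
    if depth == 0:
        return 0.0, []
    best = (float("-inf"), [])
    for nxt in neighbors[node]:
        future, tail = best_path(nxt, neighbors, payoff, depth - 1)
        total = payoff[nxt] + future
        if total > best[0]:
            best = (total, [nxt] + tail)
    return best


# Hypothetical four-node graph; D is valuable but two moves away from A.
neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
payoff = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 10.0}

one_step = best_path("A", neighbors, payoff, depth=1)
two_step = best_path("A", neighbors, payoff, depth=2)
print(one_step, two_step)
```

In this toy case the one-step look-ahead moves to C, while the
two-step look-ahead moves to B in order to reach D, which is
the sense in which a longer look-ahead can give a better first
move. In a rolling framework only the first move of the best
sequence would be executed before re-solving.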


C. PAST WORKS


The goal of the constrained-path, moving-target search 





problem [5, 6, 7, 13, 18, 19, 21] is to find the search 





route that maximizes the probability of target detection 
within a fixed time. The classic setup involves a single 


searcher and a single target moving within a finite number 








of cells in discrete time. Both the searcher and the target 
are allowed to occupy a single cell each timestep, and 


detections may only occur when the searcher and target 





occupy the same cell. Detection probabilities can be based 
on sensor data or derived from the random search formula 


[22]. The target’s probability distribution is maintained





through Bayesian updates for nondetection each timestep if 





the target is not found. 
For the classic constrained-path, moving-target search 


problem, Eagle and Yee [6] select a searcher route over a 





given number of time periods that minimizes the probability 





of nondetection. Their formulation is a non-linear program 


with linear constraints, which allows one to apply 
Zangwill’s [24] Convex Simplex Method (CSM). Eagle and Yee





[6] create a myopic search, and while results of their 


example show the CSM solution to always be optimal, the 





myopic search may not provide a good approximation of the 


optimal solution. 








A partially observable Markov decision process [2] is 


another concept that has been applied to the constrained- 





path, moving-target search problem. The idea is that a 


decision must be made based on partial information, and the 





outcome of the decision is unknown until after it has been 





made. The search application is well-suited for this setup
because the searcher will have incomplete knowledge of
target location after each timestep based on the updated
target probability distribution. The searcher will not know
whether or not the search will be successful until after the
new search route is chosen.


Eagle [5] provides an optimal solution technique using 





dynamic programming and assuming a finite time horizon. He 


uses a partially observable Markov decision process, which 





is faster than standard linear programming methods because 





total enumeration is limited to searching only the cells one 
can reach from the searcher’s previous location. Stewart 
[18, 19] creates an approximate solution procedure using 


branch-and-bound techniques. Eagle and Yee [7] extend 





Stewart’s work and create a branch-and-bound method that 


produces optimal solutions and is faster than the dynamic 





programming approach. Washburn [21] creates a branch-and- 
bound approach as well. Both Eagle and Yee [7] and 
Washburn [21] consider searchers that have continuous search 


routes. Other than Washburn [21], who accounted for
multiple searchers, these problems consider one searcher
against a single target and provide optimal solutions.

Dell, Eagle, Martins, and Santos [4] extend the problem
to include multiple searchers. They create a branch-and-
bound procedure to optimally solve the problem as well as
six heuristics that take four different approaches to the
problem: solve partial problems optimally, maximize the
expected number of detections, implement a genetic
algorithm, and use local searches with random restarts. The
partial problem technique involves a moving horizon where
each one is solved optimally using branch-and-bound.

Members of the autonomous systems and control community
have analyzed the multiple UAV search problem as well. Some
utilize recursive Bayesian filtering [8, 10] while others
focus on control [1, 11] and decentralized cooperative
search techniques [9]. Many of them have considered the
problem of multiple searchers looking for multiple targets
[1, 8, 9, 10, 23], which is an extension to the works
mentioned above [5, 6, 7, 13, 18, 19, 21]. Fernandez,
Flint, and Polycarpou as well as Chung and Burdick [3]
create a Bayesian method that helps take into account false
positives.

Another consideration is using discrete time to more
closely model continuous time. This situation occurs when
the travel time for targets and searchers between cells is
not a multiple of the discrete timestep. Lau, Huang, and
Dissanayake [13] enhance the branch-and-bound method to take
into consideration the non-uniformity of the search
environments. They develop a new bound that leads to faster
solution times as well as the possibility of better
solutions when the environment being modeled is spatially
and temporally non-uniform in nature. Sato and Royset [17]
produce alternative bounds and even faster solutions.





In the near future, sufficient technology will exist to 


allow the automatic detection of targets by computer 





systems. When these automatic detections can be 
incorporated within a search program, it will allow the 
autonomous routing of UAVs. With current technology, human 
operators are required to visually identify targets. The 








issue of target detection can be handled with a decision aid 


that has an input for the detections made each timestep. 








While many solutions have been presented for the
constrained-path moving-target search problem and some
research tools have been developed for specific scenarios
(see, e.g., [15, 16]), a decision aid that can be used in
real-world scenarios has yet to be fully developed. The
goal of our research is to provide a user-friendly decision
aid that is capable of creating efficient UAV routes for
detecting multiple targets operating on a known graph. This
decision aid will be capable of providing effective
decisions in real time, with computation times on the order
of seconds.


D. STRUCTURE OF THESIS AND CHAPTER OUTLINE 


This thesis is divided into five chapters, including
the Introduction. Chapter II discusses the development of
the model and the dynamic programming formulation. Chapter
III introduces the actual algorithm used to implement our
model. Next, it analyzes the accuracy and runtime of our
heuristic approach. Finally, it discusses the Excel user
interface created for our decision aid. Chapter IV describes
our field experiments at Camp Roberts, California, and
explains some of the updates our decision aid underwent in
the process. Chapter V gives several conclusions from our
work as well as recommendations for future work involving
ASOM.
II. MODEL


A. MODEL DEVELOPMENT 


We formulate a model of SOP using dynamic programming 





with Bayesian updating. We assume that each target moves 





according to a Markov process and that the targets move 


independently of one another. The presentation below and 





our implementation of the model assume that all the Markov 





processes for the various targets have the same transition 


matrices. However, it is trivial to extend this to the 





general case where targets follow different movement 
processes. Targets are differentiated by their velocity and 
type characteristics (e.g., person versus vehicle).


The searchers are differentiated by a variety of 





characteristics including name, velocity, sweepwidth of 





their sensors, and whether or not they have a camera with a 





moving eye which enables them to search nearby roads while 
flying straight routes between nodes. 


All dynamic programming models must have discrete 








timesteps. In our model, timesteps are used as a discrete 


representation of continuous time. One timestep is the 





length of time between each discretized value of time with 
smaller timesteps being a better approximation of continuous 
time. 

Our dynamic programming model contains several states
that change according to some process as the model advances
through time by the use of timesteps. The state of the
searcher includes the arc the searcher is currently on, the
amount of time until the searcher reaches the head node of
that arc, and the type of move that is currently being
executed. There are three possible types of moves: “Road
Search,” “Transit,” and “Search at Location.” “Road Search”
means that the searcher examines the road corresponding to
the current arc while traversing it. It is possible to
detect targets on that road, and any time remaining of the
timestep after reaching the head node of the arc is spent
searching that head node. “Transit” means that the searcher
flies a direct route from the tail node to the head node.
It is not possible to detect a target while completing this
type of move, but it offers the possibility of reaching the
head node faster, which allows more time for search at that
node. “Search at Location” means that the searcher spends
the entire timestep searching its current location.





The other main states in the dynamic programming model
are the target probability maps. There is one probability
map for each target, and the entire map is a matrix where
the entry in row i and column j represents the probability
that the target is on arc (i, j); if i = j, the entry
represents the probability that the target is at node i.
These probability maps are dynamically updated as the model
transitions from one timestep to another. The updates due
to detections and nondetections using Bayesian updating are
carried out first. Then, the updates due to movement of
targets by the Markov process are computed.
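
The motion step that follows the Bayesian step can be sketched
as a multiplication of the probability vector by a transition
matrix. This is a simplified illustration, not the ASOM
update: it assumes targets sit only on nodes and move once per
timestep, whereas the model above also tracks arcs and travel
times. The function name and the numbers are hypothetical.

```python
def markov_update(prob, transition):
    """Propagate location probabilities one timestep.

    prob[i] is P(target at node i) after the Bayesian update;
    transition[i][j] is P(target moves from node i to node j).
    """
    n = len(prob)
    return [
        sum(prob[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


# Hypothetical three-node example; each row of the matrix sums to 1.
after_bayes = [0.2, 0.5, 0.3]
transition = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
after_motion = markov_update(after_bayes, transition)
print(after_motion)
```

Because every row of the transition matrix sums to one, the
propagated probabilities still sum to one.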
More specifically, when detections are made, the
location and type of detection are entered into the model.
The model updates the target probability maps for the
detections based on the probabilities of seeing different
targets at the input detection locations. It looks at all
the different “detection scenarios,” determines the
probability of each happening, and decides which scenario
occurred based on a random draw with the associated
probabilities. Here, a “detection scenario” is an element
of the set of all the different permutations of possible
target detections at each detection location. For example,
if there are two detections at time t and three available
targets, the model creates all possible permutations of
target detection scenarios. In this situation, there are
six possible scenarios: three choices (possible targets) for
the first detection and then two remaining choices (one of
the two not found in the first detection) for the second
detection. The model then computes the probability of each
of the six different scenarios occurring based on the target
marginal probabilities and decides which one actually
occurred using a random draw with the corresponding
probabilities.
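
The scenario enumeration in the example above can be sketched
as follows. The weighting rule used here (product of the
assigned targets' marginal probabilities at the detection
locations, then normalized) is an assumption made for
illustration; the text only states that scenario probabilities
are based on the target marginals. The target names, node
names, and probabilities are hypothetical.

```python
import random
from itertools import permutations


def detection_scenarios(marginals, detection_locations):
    """List every assignment of distinct targets to the detections.

    marginals[u][loc] is P(target u is at loc). Returns a list of
    (assignment, probability) pairs whose probabilities sum to 1.
    """
    scenarios = []
    for assignment in permutations(list(marginals), len(detection_locations)):
        weight = 1.0
        for target, loc in zip(assignment, detection_locations):
            weight *= marginals[target].get(loc, 0.0)  # assumed weighting
        scenarios.append((assignment, weight))
    total = sum(weight for _, weight in scenarios)
    if total == 0.0:
        raise ValueError("no scenario is consistent with the marginals")
    return [(a, w / total) for a, w in scenarios]


# Hypothetical data: three targets, detections at nodes n1 and n2.
marginals = {
    "u1": {"n1": 0.6, "n2": 0.1},
    "u2": {"n1": 0.3, "n2": 0.3},
    "u3": {"n1": 0.1, "n2": 0.6},
}
scenarios = detection_scenarios(marginals, ["n1", "n2"])

# Random draw of the scenario that "occurred"; seeded for repeatability.
rng = random.Random(0)
chosen = rng.choices(
    [a for a, _ in scenarios], weights=[p for _, p in scenarios], k=1
)[0]
print(len(scenarios), chosen)
```

Two detections and three targets give 3 x 2 = 6 ordered
assignments, matching the count in the text.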

We also use the concept of search thresholds. This 
threshold is a user input between 0 and 1 used to determine 
what level of target knowledge will constitute “knowing” 


where a target is located. This is an attempt to gain 








better total situational awareness by ignoring targets that 








we “know” are at certain locations. A threshold value of 1 





creates a greedy policy where searchers will circle targets 








unless a higher probability mass presents itself at a nearby 





location. On the contrary, if the threshold value is less 





than 1, then targets whose maximum probability mass is above 





that threshold will not be searched for, resulting in a less 








greedy policy. 


We also calculate an aggregate probability map to
represent the normalized probability of all targets that are
unknown (i.e., do not reach the threshold) by summing the
probability mass of all unknown targets at each location and
dividing it by the number of these targets.
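
A minimal sketch of the aggregate map follows, assuming each
target's probability map is stored as a square list of lists.
The function name, the return value of None when every target
is "known," and the example numbers are assumptions made for
illustration.

```python
def aggregate_map(target_maps, threshold):
    """Average the probability maps of all targets below the threshold.

    target_maps is a list of square matrices, one per target, where
    entry [i][j] is the probability that the target is on arc (i, j).
    """
    # A target is "unknown" if no single entry reaches the threshold.
    unknown = [
        m for m in target_maps
        if max(max(row) for row in m) < threshold
    ]
    if not unknown:
        return None  # every target is considered known
    n = len(unknown[0])
    return [
        [sum(m[i][j] for m in unknown) / len(unknown) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example with three targets on a two-node graph.
known_target = [[0.9, 0.1], [0.0, 0.0]]   # peak 0.9 exceeds the threshold
unknown_b = [[0.4, 0.2], [0.2, 0.2]]
unknown_c = [[0.2, 0.2], [0.2, 0.4]]
agg = aggregate_map([known_target, unknown_b, unknown_c], threshold=0.8)
print(agg)
```

Here the first target is excluded, and the aggregate map is
the entrywise average of the remaining two maps.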


SOP is defined in terms of some finite time horizon.
This may be related to the endurance of the searchers (e.g.,
UAV flight time) or operational considerations. In practice,
the time horizon may not be completely known. Looking
further into the future with a dynamic program will give
better decisions in the current timestep than a shorter
look-ahead. To limit computing time and allow for a
real-time decision aid, we only consider a two-timestep
look-ahead, i.e., we set the time horizon in SOP to two. We
call this the two timestep look-ahead problem (TTLP). The
objective function in TTLP, which we maximize, is the
expected number of target detections at all arcs and nodes
visited during a given sequence of two moves for all
searchers. In determining the aggregate probability mass
for the second time period, the objective function assumes
that there are no detections during the first timestep. The
TTLP can be solved optimally by total enumeration, but as
the number of searchers increases, the computational effort
increases exponentially. As a result, we constructed a
heuristic algorithm for solving TTLP. The heuristic
algorithm amounts to total enumeration of all solutions of a
simplified two timestep look-ahead problem (STTLP), which we
describe next. The mathematical formulation of STTLP follows
in Section B.


STTLP is identical to the TTLP except that it involves
a simplified objective function. The STTLP objective
function, as in TTLP, is the expected number of detections,
but now the expected number of detections is computed
slightly differently in the second timestep. The
probability mass present in the second timestep is
calculated for each searcher independently (with no
conflicts between moves), only taking into account
probability updates for that particular searcher’s previous
move (not all previous moves as in the TTLP). As with the
TTLP, it is assumed that there are no detections during the
first timestep. All states and arrays that are relevant to
this update are labeled with the superscript “ND” (no
detection).











The following is an example of the flow of ASOM. After 


the initial states are established, the searchers are given 








starting locations. If there are no initial detections, 





ASOM recommends searcher moves based on the STTLP. For each 





timestep, detections are entered and ASOM reoptimizes the 





recommended searcher moves for the next timestep given there 
are no more detections. At this point, the operator can 
either accept the recommendations or enter in alternate 
searcher moves. This process is repeated for each timestep 


until the search is completed. 


B. DYNAMIC PROGRAMMING FORMULATION OF STTLP 


For notational convenience, we use $\bullet$ to denote the
use of an array of all the available values for that index;
thus, for some values $X_{i,j}$,
$X_{\bullet,j} = (X_{1,j}, X_{2,j}, \ldots, X_{|I|,j})^T$.





Indices

$i, j, k$   Nodes
$m$         Searcher
$t$         Timestep
$u$         Target
$b$         Types of targets

Sets

$M$
    Set of all available searchers, $m \in M$.
$I$
    Set of all available nodes, $i, j, k \in I$.
$T$
    Set of timesteps, $t \in T$.
$U$
    Set of targets, $u \in U$.
$B$
    Set of target types, $b \in B$.
$R \subset I \times I$
    Subset of pairs of nodes (i, j) representing arcs for
    which there is a road connecting i to j, $(i, j) \in R$.
    Also, $(i, i) \in R$, $\forall i \in I$.
$Q \subset I \times I$
    Subset of pairs of nodes (i, j) representing possible
    transit arcs between i and j, $i, j \in I$.

Data

$DISTANCE_{i,j}$
    Distance along road corresponding to arc (i, j) (mi),
    $(i, j) \in R$.
$TRANSIT_{i,j}$
    Straight-line distance between nodes i and j (mi),
    $(i, j) \in Q$.
$SEARCHARC_m$
    1 if searcher m searches a road while on transit arcs,
    0 otherwise, $m \in M$.
$SPEED_m$
    Constant speed of searcher m (mph), $m \in M$.
$SW_m$
    Sweep width of searcher m (mi), $m \in M$.
$SPEEDT_u$
    Speed of target u, $u \in U$.
$STEP$
    Duration of timestep (minutes).
$START_m$
    Starting node of searcher m, $m \in M$.
$PD_{i,j,m}$
    Probability of detecting a target on the road
    corresponding to (i, j) for searcher m given that a
    target is on the road, $(i, j) \in R$, $m \in M$. If
    i = j, then $PD_{i,i,m} = 0$ since detection at a node
    is determined by the function $PDET_{i,m}(\tau)$,
    defined later.
$MATRIX_{i,j}$
    Probability of a target moving onto arc from node i to
    node j, $i, j \in I$.
$TTS_{i,j,u}$
    Target timesteps calculation, the amount of timesteps
    target u takes to travel arc (i, j),
    $(60\, DISTANCE_{i,j} / ((STEP)(SPEEDT_u)))$,
    $(i, j) \in R$, $u \in U$.
$THRESHOLD$
    An input threshold between 0 and 1 to determine what
    level of target knowledge will constitute “knowing”
    where a target is.
$TURN$
    Constant probability that a target travelling along an
    arc (i, j), $(i, j) \in R$, will turn around and go the
    other way.
$TYPE_u$
    The type of target u, $u \in U$, $TYPE_u \in B$.
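
As a worked instance of the target timesteps quantity in the
data list above, the sketch below evaluates 60 DISTANCE /
(STEP x SPEEDT) for made-up numbers. It assumes target speed
is given in miles per hour (the list states units only for
searcher speed), and it applies no rounding, since the text
does not make clear whether rounding is used. The function
name is hypothetical.

```python
def target_timesteps(distance_mi, step_min, speed_mph):
    """Timesteps a target needs to travel an arc.

    distance_mi / speed_mph is the travel time in hours; multiplying by
    60 converts to minutes, and dividing by step_min converts to timesteps.
    """
    return 60.0 * distance_mi / (step_min * speed_mph)


# Hypothetical arc: 2 miles long, 5-minute timesteps, target at 30 mph.
tts = target_timesteps(distance_mi=2.0, step_min=5.0, speed_mph=30.0)
print(tts)
```

With these numbers the target needs 4 minutes to traverse the
arc, which is 0.8 of a 5-minute timestep.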


The following decision variables are computed at every time
$t \in T$.

Decision Variables at Timestep t

$x_{i,j,m,t}$
    1 if searcher m is traveling from i to j, 0 otherwise.
$y_{m,t}$
    Time until searcher m completes the recommended move
    (hrs).
$z_{m,t}$
    1 if searcher m is searching, 0 otherwise.
$V_{m,t} = (x_{\bullet,\bullet,m,t},\ y_{m,t},\ z_{m,t})^T$
    Variable Array for searcher m.
$X_t = (V_{1,t}, V_{2,t}, \ldots, V_{|M|,t})$
    Variable Array for all searchers.


States at time t

$SEARCHER_{m,t} = (i,\ z_{m,t-1},\ y_{m,t-1})^T \quad \forall m$, where
    $i$          Current Location/Destination;
    $z_{m,t-1}$  1 if searching, 0 if transiting from previous
                 timestep (Assume 1 if t = 1);
    $y_{m,t-1}$  Time to completion of the move from the
                 previous timestep for searcher m (hrs)
                 (Assume 0 if t = 1).
$MARG_{i,j,u,t}$
    Probability of target u being on arc (i, j),
    $(i, j) \in R$, $u \in U$, $t \in T$.
$$AGG_{i,j,t} = \frac{\sum_{u \in U \mid \max(MARG_{\bullet,\bullet,u,t}) < THRESHOLD} MARG_{i,j,u,t}}{\sum_{u \in U \mid \max(MARG_{\bullet,\bullet,u,t}) < THRESHOLD} 1} \quad \forall i, j$$
    Aggregate probability of all targets being on arc
    (i, j), $(i, j) \in R$, $t \in T$.
$S_t = (SEARCHER_{\bullet,t},\ MARG_{\bullet,\bullet,\bullet,t})^T$
    State Vector.
$MARG^{ND}_{i,j,u,m,t}$
    Probability of target u being on arc (i, j) according
    to the viewpoint of searcher m, $(i, j) \in R$,
    $u \in U$, $m \in M$, $t \in T \setminus \{1\}$.
$$AGG^{ND}_{i,j,m,t} = \frac{\sum_{u \in U \mid \max(MARG^{ND}_{\bullet,\bullet,u,m,t}) < THRESHOLD} MARG^{ND}_{i,j,u,m,t}}{\sum_{u \in U \mid \max(MARG^{ND}_{\bullet,\bullet,u,m,t}) < THRESHOLD} 1} \quad \forall i, j, m$$
    Aggregate probability of all unknown targets being on
    arc (i, j), $(i, j) \in R$, from the viewpoint of
    searcher m, $m \in M$, $t \in T \setminus \{1\}$.
$S^{ND}_{m,t} = (SEARCHER_{\bullet,t},\ MARG^{ND}_{\bullet,\bullet,\bullet,m,t})^T$
    The current state according to searcher m. This is only
    used in the future look-ahead, $t \in T \setminus \{1\}$.





In the next two sections on functions and random inputs, 


parts of the formulation are not included for notational 





convenience. For a complete list, see Appendix 


Random variables and sets during time t

$D_{i,j,b,t}$
    Number of detections of type b on arc (i, j) during
    time t, $(i, j) \in R$, $t \in T$.

Functions

$R(S_t, X_t)$
    Reward for all searchers traveling between node i and
    j, $m \in M$, $i, j \in I$.
$R^{ND}(S^{ND}_{m,t}, V_{m,t})$
    Reward for searcher m traveling between node i and j,
    $m \in M$, $i, j \in I$. This function is only used in
    calculating the future reward when there is only
    knowledge of the searcher m.
$PDET_{i,m}(\tau)$
    Probability of detection at node i by searcher m,
    dependent on amount of time searched, $\tau$,
    $i \in I$, $m \in M$.
$NEGATIVE_{i,j,u,t}(S_t, X_t)$
    Function to update probability maps for failed
    detection via Bayesian updating.
$NEGATIVE^{ND}_{i,j,u,m,t}(S_t, V_{m,t})$
    Function to update probability maps for failed
    detection via Bayesian updating for look-ahead. The
    heuristic approach only takes into account the move of
    searcher m.
$MARKOV_{i,j,u,t}(S_t)$
    Function to update probability maps for target movement
    based on the Markov matrix.
$MARKOV^{ND}_{i,j,u,m,t}(S^{ND}_{m,t})$
    Function to update probability maps for only the
    movement of the targets. It is used in the calculation
    of the “no detection” marginals according to searcher m.
$POSITIVE_{i,j,u,t}(S_t, D_{\bullet,\bullet,\bullet,t})$
    Function to update probability maps for positive
    detection via Bayesian updating.
Policy: Set $X_t = X_t^*$, where $(X_t^*, X_{t+1}^*)$ is the
optimal solution of the simplified two timestep look-ahead
problem (STTLP):

$$\max_{X_t,\, X_{t+1}} \; R(S_t, X_t) + \sum_m R^{ND}(S^{ND}_{m,t+1}, V_{m,t+1})$$

Subject to: 

Dia aoe ee 
mi m'eM\m 





(Do not allow overlapping of moves) 
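$\sum_m x_{i,j,m,t} \le 1 \quad \forall i, j$
(Max one searcher per arc at time t)

$\sum_m x_{i,j,m,t+1} \le 1 \quad \forall i, j$
(Max one searcher per arc at time t + 1)

$\sum_{i,j} x_{i,j,m,t} \le 1 \quad \forall m$
(One move per searcher at time t)

$\sum_{i,j} x_{i,j,m,t+1} \le 1 \quad \forall m$
(One move per searcher at time t + 1)

If searcher m is at node i at time t, then:
$\sum_j x_{i,j,m,t} = 1 \quad \forall m$
(Must start at the starting position)
End if

$\sum_{(i,j) \in R} x_{i,j,m,t} \ge z_{m,t} \quad \forall m$
(Tracks transiting/searching at time t)

$\sum_{(i,j) \in R} x_{i,j,m,t+1} \ge z_{m,t+1} \quad \forall m$
(Tracks transiting/searching at time t + 1)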







Vm, If y,, <STEP, then: 


» Xi jmt = > X i km,t+l VJ 
i k 
Else: 


aoa = Xj jms+l VJ 


L 


End if 


(Continuity of route) 





es > (ns (DISTANCE, ,z,,, + TRANSIT, ;(1-z,,,))) 











ee: Vm 
“STEP SPEED,, 
Else If t2>2, then: 
DISTANCE, ,Z,4 + 
cit?" | TRANSIT, , (1-2, ) 
Ye = MAX) Voy — STEP, Vm 


SPEED, STEP / 60 


(Keeps track of timesteps until searcher m is 





available) 


End If 





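$x_{i,j,m,t} \in \{0,1\} \quad \forall i,j,m$
$x_{i,j,m,t+1} \in \{0,1\} \quad \forall i,j,m$
$z_{i,j,m,t} \in \{0,1\} \quad \forall i,j,m$
$z_{i,j,m,t+1} \in \{0,1\} \quad \forall i,j,m$
$y_{m,t} \ge 0 \quad \forall m$
$y_{m,t+1} \ge 0 \quad \forall m$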
Dynamics (Given $S_t$ and $X_t$)


Vim, if Sige ey phen: 


SEARCHER 4, = (isZnus Ym)” 


Sets the searcher’s state to the decisions of 








that searcher for this timestep. 























End If 

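$\mathrm{MARG}_{i,j,u,t+1} = \mathrm{MARKOV}_{i,j,u,t}(\mathrm{NEGATIVE}_{i,j,u,t}(\mathrm{POSITIVE}_{i,j,u,t}(S_t, D_{\cdot,\cdot,b,t}), X_t)) \quad \forall i,j,u$
    Updates the target marginals for the positive detection updates, the negative detection updates, and the movement of the targets based on the Markov process.

$\mathrm{MARG}^{ND}_{i,j,u,m,t+1} = \mathrm{MARKOV}^{ND}_{i,j,u,m,t}(\mathrm{NEGATIVE}^{ND}_{i,j,u,m,t}(\mathrm{POSITIVE}_{i,j,u,t}(S_t, \mathrm{ZEROS}_{\cdot,\cdot,b,t}), V_{m,t})) \quad \forall i,j,u,m$
    ZEROS denotes a matrix of zeros as input for the detection matrix, or “no detections found” in human input terms. The update only has knowledge of one searcher at a time, thus it calculates marginals separately for each searcher m.

$S_{t+1} = (\mathrm{SEARCHER}_{\cdot,t+1}, \mathrm{MARG}_{\cdot,\cdot,\cdot,t+1})^T$

$S^{ND}_{m,t+1} = (\mathrm{SEARCHER}_{m,t+1}, \mathrm{MARG}^{ND}_{\cdot,\cdot,\cdot,m,t+1})^T$
    Sets the regular and no detections state variable arrays.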
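The update order in the dynamics (a failed-detection Bayesian update followed by Markov target movement) can be sketched in Python for a single target. This is a minimal sketch under stated assumptions, not the thesis's MATLAB code: locations are flattened into one list instead of arcs (i,j), the names `negative_update` and `markov_step` are hypothetical, a row-stochastic matrix stands in for the Markov movement matrix, and the positive-detection step is omitted.

```python
def negative_update(marginal, pd):
    """Bayes update after a failed search.

    Each location's mass is scaled by (1 - pd) at that location and the
    result is renormalized by the sum of the posterior masses.
    """
    posterior = [p * (1.0 - d) for p, d in zip(marginal, pd)]
    total = sum(posterior)
    if total == 0.0:
        # Degenerate case (all mass searched with certainty): keep the prior.
        return list(marginal)
    return [p / total for p in posterior]


def markov_step(marginal, transition):
    """Move probability mass one timestep using a row-stochastic matrix."""
    n = len(marginal)
    return [sum(marginal[i] * transition[i][j] for i in range(n)) for j in range(n)]


# Hypothetical three-location example; only location 0 is searched.
marginal = [0.5, 0.3, 0.2]
pd = [0.8, 0.0, 0.0]
transition = [
    [0.6, 0.4, 0.0],
    [0.2, 0.6, 0.2],
    [0.0, 0.4, 0.6],
]

after_search = negative_update(marginal, pd)
next_marginal = markov_step(after_search, transition)
```

In this toy case the searched location loses mass after the failed search, and the movement step then spreads the remaining mass to neighboring locations while keeping the total at one.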
III. IMPLEMENTATION

A. MODEL IMPLEMENTATION

We implement the model in MATLAB version 7.0.1 and carry out all computational tests on a NES computer with a 1.83 gigahertz AMD Athlon XP processor and 512 megabytes of RAM. As described earlier, we implemented a heuristic solution to the TTLP, called STTLP. The code is written in many sub-functions so that a single aspect of ASOM can be changed without having to go through the entire code. The descriptions of our MATLAB functions are given in Appendix II.














B. HEURISTIC ACCURACY 


The only straightforward method for ensuring that optimal searcher moves are chosen is total enumeration. The difficulty with total enumeration is that the total number of searcher move combinations grows exponentially with the number of searchers added to the TTLP. Thus, we need the heuristic algorithm, STTLP (see Section B in Chapter II). We compare our heuristic with the total enumeration approach in terms of runtime and accuracy to ensure it provides effective recommendations and that its speed improvements are worth sacrificing optimality. For one, two, and three searchers we create random target marginals, randomly place the searchers, and compare the moves recommended by our heuristic and total enumeration functions. We allow searchers to be “initially blocked” with a probability of .25. Here, “initially blocked” means that the searchers are constrained in their movements from the previous timestep (i.e., still in transit). This .25 probability represents the fact that during a normal run of our decision aid, the searchers make direct transits that require two timesteps and are blocked from making a new move for one timestep.
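To illustrate why total enumeration scales poorly, the sketch below counts joint move combinations when each searcher independently chooses among a fixed number of candidate two-step routes. The figure of 30 routes per searcher is purely hypothetical, and the no-overlap constraints are ignored, so the counts are an upper bound for a toy setting and not the size of ASOM's actual search space.

```python
from itertools import product


def joint_move_count(routes_per_searcher, num_searchers):
    """Number of joint plans if every searcher picks one route independently."""
    return routes_per_searcher ** num_searchers


def enumerate_joint_moves(routes, num_searchers):
    """Explicitly list every joint plan (only feasible for tiny instances)."""
    return list(product(routes, repeat=num_searchers))


# Hypothetical: 30 candidate two-step routes per searcher, 1 to 4 searchers.
counts = {m: joint_move_count(30, m) for m in range(1, 5)}

# Tiny explicit enumeration: 3 routes, 2 searchers.
small_plans = enumerate_joint_moves(["a", "b", "c"], 2)
```

Each added searcher multiplies the number of joint plans by the number of routes per searcher, which is the growth that a per-searcher heuristic avoids.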


Table 1 shows the accuracy results of the heuristic for 1000 simulation runs. The accuracy is the ratio of the probability mass collected by the heuristic to that collected by the total enumeration approach. The table also displays the fraction of time the heuristic returns the optimal move. The “Within One Move of Optimal” column gives the fraction of time that the heuristic moves differed from the total enumeration moves for at most one searcher. Table 2 displays the runtimes of the heuristic and total enumeration approaches for one, two, and three searchers along with their 95% confidence intervals.


Table 1. Heuristic Accuracy Table. 


Number of Searchers | Accuracy | Returns Optimal (TTLP) Move | Within One Move of Optimal (TTLP)
1 fee 


1 
0.944 0.985 
0.843 0.934 


Table 2. Heuristic Runtime Table. 


STTLP Runtime (sec) | Total Enumeration (TTLP) Runtime (sec)
.02165 +/- .00074 | .01462 +/- .00074
.04219 +/- .00093 | .8560 +/- .046
.07381 +/- .0076 | 64.21 +/- 4.45
.1046 +/- .00227 | 4186 (estimated)
C. EXCEL INTERFACE





The Microsoft Excel Interface was developed by Mr. 
Anton Rowe. Figure 4 is an example of the output display in 


the user interface. 





Figure 4. Screenshot of Excel Interface. 



















In Figure 4, the red circles represent all possible nodes and the red triangles represent all possible roads. The different sizes of the circles and triangles represent the aggregate probability of finding targets there. The solid blue boxes represent the different searchers at their current locations in this scenario. The blue lines and outlined boxes represent the recommended searcher moves for the current timestep. If a triangle is encased in the outline of a blue box, this means the recommendation is to search the road to the corresponding node. A dotted blue line going straight to a node means transit directly to that node. If there is an outline of a blue box in the middle of a transit route, this means the searcher will not get to the designated node in one timestep and thus it is a directed move for the following timestep as well. If a searcher is stationary (zero speed) then the recommended move will always be to stay at the same location, as shown by the blue outline around its current position. In the example above, Raven is transiting from node 3 to 6, but will take two timesteps to reach node 6. Buster is searching the road from node 2 to node 8 (one timestep) and Scan Eagle is transiting from node 11 to node 9 (one timestep).

There are several required inputs for ASOM including parameters for both searchers and targets. For each available searcher, the name (as it will be displayed on the interface) should be provided, as well as the speed, sweepwidth, a binary entry for whether the UAV has a moveable camera capable of searching roads while flying straight line distances, and the starting position. An example input is seen in Figure 5. Notice there is also a stationary searcher in the scenario below, which is input by a searcher with speed equal to zero. A starting position must also be provided, but the “Sweep” and “WAC” categories for a stationary searcher are not used.








[Figure 5: searcher input sheet with columns Speed, Sweep, Initial, Last, Current, and Next, and rows Scan Eagle, Buster, Raven, and Stationary.]





The available targets are simple inputs of the expected 
number and type of each target that will be available in the 


scenario. For each target, a speed and type must be 





provided, as seen in Figure 6. If the number of targets is 
not known, a reasonable estimate should be provided; the 


better the estimate the more accurate the model will be. 








Figure 6. Example Target Input. 










Detections are input during the current timestep of a 








model. The key feature here is the “Recommend” button. 


When pushed, this button gives recommendations based on the 








current state. If, however, detections are made between 
then and the end of the timestep, they can be inputted to 


update the state and a new set of moves will be outputted. 
An example timeline of entering detections and moving targets can be seen in Figure 7.


Figure 7. User steps in ASOM. 


[Flowchart of user steps: network, searcher, and target information initialized; reset decision aid; enter initial detections (if any); get recommendations; detections?; get recommendations.]








Detections are inputted with four parameters: (i) timestep of the detection, (ii and iii) perceived starting node and ending node location of the target, and (iv) detection type. The starting and ending node location together represent the arc (i,j) (location) in which the target was detected, where if i = j, the target was detected stationary at node i; and if i ≠ j, the target was detected on the road going from node i to node j. An example of what the target detection sheet might look like at timestep 5 can be seen in Figure 8. In this example, the first line says there was a detection of type 1 on the road from node 2 to node 8 at time 1. Similarly, the second line says there was a detection of type 2 stationary at node 5 at time 3.
Figure 8. Example Target Detections.
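The four-field detection record can be sketched as a small Python data structure. This is only an illustration of the i = j versus i ≠ j convention: the class name `Detection`, its field names, and the `describe` helper are hypothetical and do not come from the Excel interface or the MATLAB code. The two example rows reuse the values quoted in the text for Figure 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    timestep: int        # timestep of the detection
    start_node: int      # perceived starting node i
    end_node: int        # perceived ending node j
    detection_type: int  # detection (target) type

    def describe(self) -> str:
        # i == j means the target was seen stationary at node i;
        # i != j means it was seen on the road from node i to node j.
        if self.start_node == self.end_node:
            where = f"stationary at node {self.start_node}"
        else:
            where = f"on the road from node {self.start_node} to node {self.end_node}"
        return f"type {self.detection_type} detection {where} at time {self.timestep}"


# The two rows described for the example detection sheet.
detections = [
    Detection(timestep=1, start_node=2, end_node=8, detection_type=1),
    Detection(timestep=3, start_node=5, end_node=5, detection_type=2),
]
descriptions = [d.describe() for d in detections]
```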










Additional data for ASOM include the latitude/longitude 
of the nodes, data for the roads (start/finishing nodes, 
length of the road, and latitude/longitude position to 
display the red triangle representing probability), direct 
distances between nodes (as a UAV can fly them), and the 


Markov movement matrix for each target. 
IV. FIELD EXERCISES





We performed two field experiments in February and May 





2008 at Camp Roberts, California using multiple Raven and 





Buster UAVs. 


An important part of ASOM is the ability to take into 





consideration the needs of the operator and the possibility 





to react to unexpected situations. Several features of ASOM 


would not exist if we did not field test the decision aid 





and receive feedback from UAV operators. This allows ASOM 


to handle realistic scenarios in multiple environments. 





A. FEBRUARY EXPERIMENT 


The purpose of the February experiment was to test a 





preliminary version of ASOM and make sure the results passed 





a reality check. A secondary purpose was to see what could 
be improved in the underlying code and what changes were 


necessary to make ASOM run more smoothly. There were several
weather restrictions that limited the experiment, but





overall the objective of the experiment was accomplished. 


We ran our preliminary model with 5 moving targets 
(cars) traveling at 25 miles per hour and three searchers: 


one ground team, one Raven UAV, and one Buster UAV. ASOM 





isolated the possible location of the targets to one side of 





the map, as seen in Figure 9, and was correct in its 


judgment of possible target locations. In this preliminary 








version of the model, aggregate probability is given by a 





color scale rather than a size, with green representing the 


lowest probability, fading to yellow, then finally to red 








representing the highest probability. The nodes are still
represented by circles, but the roads are represented by
straight lines between the nodes.


Figure 9. February Experiment Final Probability Map. 


[Map panel titled “Joint Target Distribution” with a probability color scale and a text listing of the current positions and recommended movements (road search or stay at location) for Buster and Raven.]








There were several important lessons learned from this 
experiment. The first stemmed from the fact that our 
approach was greedy in its search patterns. At this point, 
the searchers appeared to find a target and track it because 
this resulted in the largest reward while sacrificing 
knowledge of the other targets. This is not optimal if the 
objective is to maximize total knowledge of the system. We 
remedied this by creating the threshold input. As described 
earlier, this is equivalent to saying you “know” where a 
target is located if its maximum probability mass at any 
location is greater than the threshold probability. 
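The threshold rule can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function names, the dictionary layout, and the threshold value of 0.9 are hypothetical, and how ASOM then treats a “known” target inside its reward calculation is not shown here.

```python
def is_known(marginal, threshold):
    """A target counts as 'known' if its largest location mass exceeds the threshold."""
    return max(marginal) > threshold


def targets_still_sought(marginals, threshold):
    """Return the targets whose location is not yet considered known."""
    return [name for name, m in marginals.items() if not is_known(m, threshold)]


# Hypothetical marginals over three locations for two targets.
marginals = {
    "target_1": [0.92, 0.05, 0.03],  # concentrated: treated as known
    "target_2": [0.40, 0.35, 0.25],  # spread out: still worth searching for
}
sought = targets_still_sought(marginals, threshold=0.9)
```

With these made-up numbers only the second target remains of interest, which mirrors the intent of steering searchers away from simply tracking a target they have already pinned down.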
Another change was to make ASOM more user-friendly than the MATLAB code and input techniques used at that point. This was handled with a new Excel interface as discussed in the previous chapter. The usefulness of the interface is discussed in the May experiment section.


B. MAY EXPERIMENT 


The goal of the May experiment was to test the updated 





code, which included the target threshold constraints to 








discourage a greedy policy which tracked detected targets. 


We implemented the Excel interface for the first time and 








evaluated its utility and functionality. The experiment was 








run with four targets (again, cars traveling at 25 miles per
hour) and three searchers: one Buster UAV and two Raven
UAVs.


The first day’s trials led to the creation of the 





disabled node. This node is an abstract location where 





searchers are placed when they are refueling, damaged, or








unusable. This allows ASOM to function in a larger set of 


scenarios as well as take into account unexpected events 





where a UAV becomes disabled. For example, in the first 
trial, the Buster UAV lost contact, deployed its parachute, 
and was unable to continue its search. The Raven UAVs also 
ran out of gas sooner than expected and had to land and 


refuel, thus cutting the experiment runs short. 


The second day’s trial utilized the disabled node 


update. This trial was extended to a nearly three-hour





scenario where UAVs were forced to refuel, thus testing the 


capabilities of the disabled node. 
Figure 10 shows the locations of all of the targets and searchers as well as the color of the vehicles detected. The green and tangerine colored boxes represent actual target detections by the searchers. Yellow boxes represent possible failed detections, meaning the timing of the searcher or target leaving and the other arriving on location were close, but there could have been a failed detection. A red box means a target and a searcher were each at the same location, but there was no detection made at that time. From this, we calculated an estimate of the probability of detection with appropriate 95% confidence interval (0.46 +/- 0.20). Since the data set is relatively small, the confidence interval for the probability of detection is very wide. In any case, this might give us a better estimate on the actual probability of detection for these UAVs. In ASOM, the probability of detection is derived from the random search formula and is dependent on time as well as searcher characteristics, but it is generally higher than the above empirical estimate.

Figure 10. May Experiment Detection Results.
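As a rough illustration of how an empirical detection probability and its 95% confidence interval can be computed from co-location events, the sketch below uses the normal-approximation (Wald) interval. The thesis does not state which interval method or which counts were used, so both the method and the counts (11 detections in 24 opportunities) are assumptions chosen only for illustration.

```python
import math


def wald_interval(detections, opportunities, z=1.96):
    """Point estimate and half-width of a normal-approximation interval.

    detections:    number of co-location events that produced a detection
    opportunities: total number of co-location events
    z:             standard normal quantile (1.96 for roughly 95%)
    """
    p_hat = detections / opportunities
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / opportunities)
    return p_hat, half_width


# Hypothetical counts, not taken from the experiment records.
p_hat, half_width = wald_interval(detections=11, opportunities=24)
```

With so few opportunities the half-width is large compared with the estimate itself, which is the sense in which a small data set gives a very wide interval.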
Failed detections could stem from any combination of three sources of error. The searchers were at incorrect locations, the targets were at incorrect locations, or our estimation of the probability of detection for searchers finding targets was inaccurate. The problem of searchers being at wrong locations seems unlikely because they are given GPS coordinates to fly to, and their locations are displayed on a screen. It is possible the targets (who were people driving around in cars) did not know the Camp Roberts map as well as we had hoped and were actually driving to wrong locations. The most likely source of error was that the camera feeds on the UAVs were scrambled enough that the operators had a hard time identifying targets, thus lowering our probability of detecting a target given a searcher and target were at the same location.


One other interesting aspect of having a long trial versus several short trials is a measurement of the situational awareness of the searchers. Specifically, the awareness of target location went in cycles. Figure 11 shows UAV locations and target location probabilities halfway through our second day’s trial. The searchers appear to have locked onto the locations of the four targets. Figure 12 shows the end of the scenario, where the searchers have some idea of the target locations, but not as good as in the previous screenshot. This shows that searcher knowledge of target location went in cycles; the searchers had the targets pinned down, then the probability mass spread out, and eventually the searchers would pin down the targets again. This could also be explained by a high estimate of the probability of detection because it would eliminate too much mass from a location that was just searched when there should still be a significant probability mass at that location. If this estimate were lowered, it would take longer for the searchers to isolate the target location, but it would be more accurate and unlikely to go through the cycle of target knowledge that was experienced in this trial.





Figure 11. Mid-Scenario Probability Map.

Figure 12. May Experiment Final Probability Map.





The second day’s trial was markedly improved. The small problems we experienced in day 1 were fixed for day 2 and the long trial ran smoothly. During the trial, the UAVs operated without any mishaps. The disabled node was used for refueling purposes and worked according to plan. The results from day 2 were informative and the Excel interface made ASOM easier to understand, even for the people observing the experiment. After implementing the target thresholds, the searchers were able to concentrate their efforts on finding targets whose location probabilities were spread out. The behavior of the searchers when they did not concentrate on searching nodes with recently found targets resulted in a noticeable improvement of situational awareness when compared to the greedier ASOM. Even after these updates, there are still a few recommendations for future work on ASOM.
V. FINAL THOUGHTS


A. CONCLUSIONS 


We have created a decision aid that recommends 
efficient search plans for multiple UAVs searching for 


multiple moving targets, possibly of different types. This 





decision aid demands few assumptions concerning the desired 


search scenario. ASOM is general enough to support many 








military or civilian search situations. It can be used to
search for terrorists moving between safe-houses and
friendly pilots who have been shot down in a wooded area.


On the civilian side, it could be used for search and rescue 





missions after natural disasters or to search for lost 
hikers in the mountains. ASOM can also incorporate 
stationary searchers or targets and can even keep track of 


different types of targets. The decision aid is capable of 





being altered for a greedy search to keep track of targets 





once they are found, or to go after other targets that have 





not been found in a while, or at all. 


Today, UAVs are increasingly used in combat situations. 





Their importance in future warfare will continue to grow and 





they are likely to become more important in many different 





civilian applications. Creating efficient search plans for
these UAVs is the problem we chose to solve, but there are 


many other topics involving efficient UAV routing. There is 





a necessity for work such as that seen in this thesis and 
the importance of such work guarantees many different 


avenues for future research in this area. 
B. FUTURE WORK


Currently, there are several aspects of ASOM that could 





be improved. Firstly, we did not take the wind speed and 


direction into account when determining flight times for 





UAVs to reach destination nodes. This update would involve 
creating a dynamic set of distance matrices that vary with 
wind speed and direction. This will make the calculations 
of arrival and search times far more accurate than the 


constant distances that we used in the calculations. While 














the wind factor is a relatively simple change to the model,
it will dramatically increase the accuracy relative to the
amount of work required.


The second change would be to do some more calculations 








and experiments to get better estimates on the probability 





of detection for different UAVs. The values we used were estimated from past experience, but we believe them to be too high. If more research were completed and better estimates found, the accuracy of the model would again be increased with a relatively small, albeit somewhat time-consuming, amount of work.





The third change would require a bit more programming experience but could, in the end, create the most accurate decision aid. This change would go beyond the two-step look-ahead problem, not by looking further into the future, but by creating an expected future reward based on the state reached after the two-step look-ahead. This would be a way of estimating any further look-ahead based on the state, as there are diminishing returns on looking further into the future and the computation time increases rapidly. This expected reward on future searches based on the state is a good way to avoid the problem of computational complexity, yet get a more accurate solution.


A fourth possible change would be to incorporate target dependence into the model. Currently,








the model assumes independent movement of the targets. This 


assumption makes computing the marginals based on movement 





from the Markov process easier than if the targets’ 





movements were dependent on one another. Getting rid of 


this assumption would be a somewhat difficult task as that 





part of the updating phase would have to be reconstructed, 


but it would be a great way to extend our work on ASOM. 


An extension to include different scenarios is to 


examine the possibility of tracking criminals after a 





robbery along city streets. In this scenario, searchers 





would first concentrate their search around the robbery 
location, but as time increases the graph of nodes and arcs 


would be forced to expand to represent the criminals getting 











away. There could even be an “escape node” to represent the 





criminals getting out of the area or exceeding the time the 


police are willing to search. 
APPENDIX I: ADDITIONAL EXPRESSIONS FOR FORMULATION


Random Variables and Sets

$\xi$
    Random variable with a uniform (0,1) distribution.

The following random data sets, $DET_{b,t}$, $C_{b,t}$, and $COMBO_{c,d,b,t}$, are used in the calculation of target detections. ASOM receives all of the detections as inputs during time t. ASOM must then determine the probability of each different possible scenario of detections occurring as explained in the model formulation. These calculations are handled by appropriate functions below; these are the random sets required for those calculations.

$DET_{b,t}$
    Set of the number of detections of type b at time t, $b \in B$, $t \in T$, $DET_{b,t} = \{1, 2, \ldots, \sum_{i,j} D_{i,j,b,t}\}$, $d \in DET_{b,t}$.

$C_{b,t}$
    Set of the number of different permutations of target detections of type b at time t, $b \in B$, $t \in T$, $C_{b,t} = \{1, 2, \ldots, |U|!/(|U| - |DET_{b,t}|)!\}$, $c \in C_{b,t}$.

$COMBO_{c,d,b,t}$
    Matrix of the different permutations of target detections of type b at time t. Detection number d of permutation number c during time t, $c \in C_{b,t}$, $d \in DET_{b,t}$, $b \in B$, $t \in T$.
Model Formulation Functions


R(S,.X,) = Y (AGG, 5, maZingPD3, jm + AGG PDET,,,(max(0, STEP - y,,,))) 


ijt i,j,m,t~\m,t jit Xi, fmt 
i,j,m 


Reward for all searchers traveling between 





node i and j, meM, ijel. The reward 
function is an important part of the model 
because it is what the model intends to 
optimize by changing the possible decision 
variables. 


RS. Vig= LAGE So PD, + AGG"? PDET,,,, (max(0, STEP — y,,,))) 


m,t ? 1,J,m,t Xi jm Sint 1,7,m JsJ,m,t Xi, jm,t j,.m 
Reward for searcher m traveling between node 


i and j, meM, ijel. This is the function 





used for the future reward where the state 
will depend on the previous moves of just one 


searcher. 


$PDET_{i,m}(\tau) = 1 - e^{-\tau \, SPEED_m \, SW_m / 7.0625}$

    Probability of detection at node i by searcher m, dependent on amount of time searched, $\tau$, $i \in I$, $m \in M$. This is the function used to determine probability of detection at a node, rather than on a road (handled earlier in the data section).
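Reading the PDET expression above as the random search formula, a short Python sketch shows how the probability of detection at a node grows with search time. The function name `pdet`, the argument names, and the interpretation of 7.0625 as an area-like constant dividing the exponent are assumptions; the units and example values are hypothetical.

```python
import math


def pdet(tau, speed, sweep_width, area=7.0625):
    """Random search formula: 1 - exp(-tau * speed * sweep_width / area).

    tau:         time spent searching the node
    speed:       searcher speed (SPEED_m)
    sweep_width: searcher sweep width (SW_m)
    area:        normalizing constant in the exponent (assumed interpretation)
    """
    return 1.0 - math.exp(-tau * speed * sweep_width / area)


# Hypothetical searcher: detection probability for increasing search times.
curve = [pdet(tau, speed=10.0, sweep_width=0.5) for tau in (0.0, 0.5, 1.0, 2.0)]
```

The curve starts at zero, increases with search time, and approaches one with diminishing returns, which is the usual behavior of the random search model.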
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TOTAL, (S,,X,) =) iim{ MARG, j,,,%;,jm,(1- PDET,,, (max(0, STEP - y,,,))) 


NEGATIVE, 


i, ju,t 


NEGATIVE” (S, 


(S,,X,) = 


m,t 


MARG, 0 %i,jms%mgt (1 a PD, jn ) + 


max(MARG,, ,,)<1 
i Vu 


jm 
1 otherwise 


Sub-function Of NEGATIVE, ,,,(S,,X,) and 


i,jut 


NEGATIVE™? | (S,,Vn;) « Te represents the 





normalizing factor, meaning it is the sum of 





all the posterior probabilities after a 








Bayesian update. If the variable array input 





is for a one searcher m (as with the input 





Vs), the summation over variable m is only 


m,t 


over the single input value m. 








MARG, ,,, C -[[(1-PD,,., ) 
i ix j 
TOTAL,(S,,X,) a5 
Vi, jiu 
MARG, ,,, fi -[](1- PDET,,,(max(0, STEP - y,,, »)| 
eee, 
TOTAL,(S,,X,) : 


Function to update probability maps for 


failed detection via Bayesian updating. 





Takes the posterior probabilities and 





normalizes by dividing by the sum of all 


posterior probabilities. 





MARG,,,,, fi -T](1-PP, in ) 


m 


TOTAL,(S,,V,,,) 





Vojd= Vi, j,u 


MARG, C -[](1-PDET,,,(max(0, STEP - y,,, »)| 


m 


TOTAL,(S,,V,,,) 





Function to update probability maps _ for 
failed detection via Bayesian updating for 
look-ahead. Heuristic approach only takes 


into account the move of single searcher m. 
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TEMP, ,,,(S,)=MARG, ,,,MATRIX,, Wi, j,u 


i,j,u,t i,j,u.t 


Sub-function of MARKOV, ,, ,(S,) and 


i, jut 





MARKOV? (S*?). It represents the probability 
m,t 


1,j,um,t 





that a target at node i will remain at node i 
for the next timestep. 


MARG, ,.,, . 
TEMP2a,,,,(S,) =>. Vi,u 
= jizj MAX(TTS, ;_,.1) 








Sub-function of TEMP2, ,,,,(S,). It represents 


the additional probability each node will 


accumulate for the next timestep by the mass 





coming in from all adjacent roads. 
TEMP2a,,(S,) i=j 
TEMP?, ,,,(S,) = ian) : ‘ Vi, Ju 
ie 0 ix] 
Ssub-function Of MARKOV, ,,,,(S,) and 
MARKOV}* ng (Smt) * It extends the previous 





i,j,u,m,t 
function, TEMP2a,,(S,), to account for the 


fact that only nodes, not arcs, have this 





property. 
(1-TURN) max(TS, ,,, -1,0)MARG, ,., a 
TEMP3, ,,,,(S,) = ze 2 Vi, j,u 
aa max(7TS; ;,,.0) 
Sub-function of MARKO, ,,,,(S,) and 
MARKOV.™, (Sv). It represents the probability 





i,j,u,m,t 


of target on arc (i,j) deciding to continue on 





that arc with (1-TURN) probability.
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TURN max(TTS ,,, —1,0)MARG, = 
TEMP4, ,,,,(S,) = i a Vi, ju 
i max(7TS , ; ,,,0) 
Sub-function of MARKO, ,,,,(S,) and 
MARKOV,"”,..,,(Sn;) + It represents the probability 


MARKOV, , ,,(S,) = | 


i, j,u,mt 


MARKOV,™.,,,(S~”) -| 


of a target on arc (i,j) deciding to turn
around with TURN probability.
TEMP, 


i, jut 


(S,) +TEMP3 


(S,) +TEMP2, ,.,,(S,) info, 
TEMP, (S,)+TEMP4,,,,,(S,) i# j wee 


i, j,u,t i, j,u,t 


Function to update probability maps for 





target movement based on Markov matrix. It 


incorporates all sub-functions to take into 





account for all probability mass leaving and 
coming into arc (i,j). 

TEMPA, , (Sin. ) + TEMP2, «(Sinz ) ied 
Sw?) +TEMP3, (Sx; ) + TEMPS, (Se?) iF 


st i,j,u,t m,t 


Vi, j,u,m 
TEMP, 


al i,j.u,t 


Function to update probability maps for 


target movement based on Markov matrix for 





the second step look-ahead. It incorporates 





all sub-functions to take into account for 











all probability mass leaving and coming into 





arc (i,j). The only difference between this





function and the normal MARKOV function, is 
that this one is performed for each searcher 
m, and the current input of the state will 


only be updated for the moves of searcher m. 


a 


ij Peeey 


MARG, . D, <d< D 
a i, j,COMBO, ¢. 5 pst K1,K2,D,t - KL k2,bst er 
PRI, c¢(Sp> Dros) = cd Kl=l k2=1 kl=1 k2=1 Vi, j,d,c 
1 otherwise 
Sub-function of PRI 4 AS 5D ings) s it 





calculates the probability of seeing a 


particular target of a particular combination 





c for detection d of that combination for 


each arc (i,j). 


PR2, ,.(S, ? Dy oy4) = I] PRI, aes (S, ? D, 455) Vi, ds c 
d 





Ssub-f£unctvon Of (PRIA) Do) + Tt. determines 





the total probability of seeing all 
detections of a particular combination for 


each arc(i,j). 


PR3.,(S,,D..5,) = | | PR2 


i,j 


(SP o.5e) Ve 


i,j,c,u,t 





Sub-function of PR4,,,;D,,,,)+ It determines 


the total probability (no normalization) of 











seeing each combination of target detections 


by multiplying over i and j. 


PR3; Seg) 
PRA AS Ds, ee (NG 
- »y PR3, 5 dus (Si 5) 
cleC, 
Sub—funcr Ton od: CHOICE, ..(S,,Dyos4) + It 





normalizes the calculation of PR3,,(S,,D,.,,) - 
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cal c 
COMBO,. PR3,,,(S,,D,.4,)<  < > PR3,,,,(S,D.., 
CHOICE, .., (S,,D..5;) = we 2, cS Pron) <S 2, taal os) Vd,c 


0 otherwise 
Sub-function of $CHOICE2_{d,b,t}(S_t, D_{\cdot,\cdot,b,t})$. $\xi$ denotes a random number drawn from a uniform(0,1) distribution. It determines the actual scenario of target detections that occurred according to the model based on this random draw and the probabilities of each scenario occurring by setting all other combination values to 0.
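The selection step described here, picking one scenario by comparing a uniform(0,1) draw against cumulative scenario probabilities, can be sketched in Python. This is a generic inverse-CDF sampling sketch under assumptions: the function name `choose_scenario` and the flat list of unnormalized scenario weights are hypothetical stand-ins for the normalized permutation probabilities, and the draw is passed in as an argument so the behavior is reproducible.

```python
from itertools import accumulate


def choose_scenario(weights, xi):
    """Return the index c whose cumulative-probability band contains xi.

    weights: nonnegative, unnormalized scenario probabilities
    xi:      a draw from a uniform(0,1) distribution
    """
    total = sum(weights)
    probabilities = [w / total for w in weights]  # normalization step
    cumulative = list(accumulate(probabilities))
    for c, upper in enumerate(cumulative):
        if xi < upper:
            return c
    return len(weights) - 1  # guard against floating-point round-off


# Three hypothetical scenarios with probabilities 0.25, 0.25, and 0.5.
picks = [choose_scenario([1.0, 1.0, 2.0], xi) for xi in (0.1, 0.3, 0.75)]
```

In actual use the draw would come from a random number generator, for example `random.random()`, and all scenarios other than the chosen one would be zeroed out as the description states.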


CHOICE2,,(S,,D 


°,°,b.t 


y=) CHOICE, (S;,Di5,) Vd 


Sub-function of POS1, 


i,j,d.u,t 





(Sega) . BEAGebs. id 


of all other combinations except the values 


of the one that actually occurred. 


i j-l u J 
Denon, <d SY ¥ Daan JOE 
k k2=1 


1=1 k2=1 kl=1 


POS], ,,,;(S,,D..5;) = Vi, j,d,u 
SDE pea (u = CHOICE2,,(S,,D..5,)) 
0 otherwise 
sub-function. of POS2,. (SDs) = It stores 





the value 1 for all locations that a 
detection occurred at arc (i,j) for target u 


and detection d and zero otherwise. 


POS 2,54 (Sis Deena) = Y, POS]; jag (Ss Downs) Vi, j,u 
d 


Sub-function of $POSTYPE1_{i,j,u,b,t}(S_t, D_{\cdot,\cdot,b,t})$. It sums
over the detection number variable so we have a 1 if a
detection occurred on arc (i,j) for target u.




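\[ POSTYPE1_{i,j,u,b,t}(S_t, D_{\cdot,\cdot,b,t}) = \begin{cases} MARG_{i,j,u,t} & \text{if } \sum_{k1,k2} POS2_{k1,k2,u,t}(S_t, D_{\cdot,\cdot,b,t}) = 0 \\ POS2_{i,j,u,t}(S_t, D_{\cdot,\cdot,b,t}) & \text{otherwise} \end{cases} \quad \forall i, j, u, b \]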


Sub-function of $POSTYPE2_{i,j,u,b,t}(S_t, D_{\cdot,\cdot,b,t})$. It
sets the value equal to the marginal value if no detection
occurred and 1 if a detection did occur (to spike the
probability) for each target type b.
\[ POSTYPE2_{i,j,u,b,t}(S_t, D_{\cdot,\cdot,b,t}) = \begin{cases} POSTYPE1_{i,j,u,b,t}(S_t, D_{\cdot,\cdot,b,t}) & \text{if } TYPE_u = b \\ 0 & \text{otherwise} \end{cases} \quad \forall i, j, u, b \]


Sub-function of $POSITIVE_{i,j,u,t}(S_t, D_{\cdot,\cdot,\cdot,t})$. It sets
the target marginals to the correct values only if the
current target being looked at, u, matches the current
type, b.


\[ POSITIVE_{i,j,u,t}(S_t, D_{\cdot,\cdot,\cdot,t}) = \sum_{b} POSTYPE2_{i,j,u,b,t}(S_t, D_{\cdot,\cdot,b,t}) \quad \forall i, j, u \]


Function to update probability maps for positive detection
via Bayesian updating. It sums over the probabilities for
different types of targets.
APPENDIX II: MATLAB FUNCTION DESCRIPTIONS


A. STEP.M FUNCTION


This function is the main workhorse that runs the algorithm.
It does all calculations, either inside the function or by
calling other functions to do the work for it. It first
updates the target marginals by running the positive
Bayesian updates (detections) for different target types
(PositiveBayesianPermutations.m). The function then makes
all essential updates to the probability of detection at
each arc (i,j), including the nodes and connecting arcs for
any stationary searchers, using the locations of each
searcher. After these steps, the function updates the target
marginals for negative Bayesian updates
(NegativeBayesian.m), the traditional application of Bayes’
theorem. Next, the function updates for target movement from
the Markov process (MarginalsMovement.m) to account for the
fact that targets could have moved during the current
timestep. Finally, the function determines which moves to
recommend for the next timestep with the current state and
detection matrix (MultiSearcherMove.m).





B. INITIALIZEMARGINALS.M FUNCTION


This function only serves a purpose for the actual
experiment. It is a way to initialize the target marginals
before an experiment begins. It takes as input the number of
targets that are going to be involved in the experiment and
returns the resulting initial target marginals. For our
experiments, we assumed a target was ten times more likely
to start at a node than on a road, but this value is
completely dependent on the conditions of the scenario.


The function calculates these initial conditions by creating
an integer count on each arc (i,j) to represent how likely a
target is to start there, giving a value of 10 to each node,
1 to each road, and zero at every other (i,j). It then
divides by the sum total of the entire matrix to convert
these counts into probabilities. Finally, it sets these
probabilities for all targets.
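The counting scheme above can be illustrated with a short Python sketch. It is not the thesis's MATLAB code: the function name, the adjacency-matrix input, and the convention that entry (i,i) stands for node i are assumptions made for illustration only.

```python
def initialize_marginals(adjacency, num_targets, node_weight=10, road_weight=1):
    """Return one n-by-n matrix of initial marginals per target.

    adjacency[i][j] is truthy when a road joins distinct nodes i and j
    (hypothetical input format). Entry (i, i) represents node i itself.
    """
    n = len(adjacency)
    counts = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                counts[i][j] = node_weight      # 10 for each node
            elif adjacency[i][j]:
                counts[i][j] = road_weight      # 1 for each road
    total = sum(sum(row) for row in counts)
    probs = [[c / total for c in row] for row in counts]
    # Every target starts with the same prior.
    return [[row[:] for row in probs] for _ in range(num_targets)]
```

With three nodes joined in a line, the counts sum to 34, so each node receives 10/34 and each directed road 1/34.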


C. AREASEARCH.M FUNCTION


This is a simple function that determines the probability of
detection at a node given the time spent searching at the
node as well as the speed and sweep width of the searcher.
It does this by using the random search formula, assuming a
circular search area of radius one-quarter mile around the
node. We assumed a random search to calculate a lower bound
on the actual probability of detection. This function is
used in the SearcherMove.m function to help determine how
much probability mass would be collected by a certain move.
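As a minimal Python sketch of this calculation, the standard random search formula gives a detection probability of 1 - exp(-W v t / A), where W is the sweep width, v the speed, t the search time, and A the area searched. The function name, argument names, and units (miles and hours) are assumptions for illustration, not the thesis's MATLAB code.

```python
import math

def area_search_pd(time_hours, speed_mph, sweep_width_miles, radius_miles=0.25):
    """Random search probability of detection within a circular area."""
    area = math.pi * radius_miles ** 2           # circle around the node
    coverage = sweep_width_miles * speed_mph * time_hours / area
    return 1.0 - math.exp(-coverage)
```

The result is zero for zero search time and approaches one as the time spent at the node grows.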


D. SEARCHERMOVE.M FUNCTION


This function takes in the state and characteristics of one
particular searcher as well as a list of nodes not available
for this searcher at this time. It returns the searcher’s
best first and second moves (the second move refers to the
move in the next timestep, which will be reoptimized based
on the actual state during the next timestep), as well as
how much probability mass these moves collect and whether
this sequence of moves takes both timesteps, thus
constraining the options for the next timestep’s move.


The function works by looping through all nodes and checking
which ones the searcher is able to transit to or conduct a
road search to during the next timestep. It accomplishes
this by using two nested “for” loops. It then updates the
target marginals with a negative Bayesian update function,
thus inherently assuming no detections were made in this
timestep, in order to get a more accurate estimate of the
state for the next timestep (this assumption is not made
during the reoptimization of the future move; it is merely
made now for a more accurate representation of the future
state). The function then uses two more nested “for” loops
inside of the other two to calculate every sequence of two
moves (still including the option of either transiting or
searching the road) and determines the reward of doing such
a sequence of moves. If the sequence of moves the function
is currently examining is better than any previous sequence,
it stores these moves as the current best. It then repeats
this process until all moves have been checked.
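The two-move enumeration can be sketched in Python under strong simplifying assumptions: searchers move only between nodes, every move takes one timestep, and the reward is the probability of at least one detection. The graph format, the single detection probability, and all names are hypothetical. The thesis's function also handles road searches and moves that span both timesteps, which this sketch omits.

```python
def best_two_moves(start, neighbors, mass, p_detect, unavailable=()):
    """Enumerate every two-move sequence and keep the best one.

    neighbors: dict node -> list of adjacent nodes.
    mass: dict node -> probability that the target is at that node.
    p_detect: chance of detecting a target located at the searched node.
    unavailable: nodes the searcher may not move to in the first timestep.
    """
    best = (None, None, -1.0)
    for first in [start] + list(neighbors[start]):
        if first in unavailable:
            continue
        reward1 = p_detect * mass.get(first, 0.0)
        # Negative Bayesian update: assume the first search found nothing.
        updated = dict(mass)
        updated[first] = mass.get(first, 0.0) * (1.0 - p_detect)
        total = sum(updated.values())
        for second in [first] + list(neighbors[first]):
            reward2 = p_detect * updated.get(second, 0.0) / total if total > 0 else 0.0
            # Detection in step one, or in step two after a miss in step one.
            reward = reward1 + (1.0 - reward1) * reward2
            if reward > best[2]:
                best = (first, second, reward)
    return best
```

The returned tuple holds the first move, the second move, and the reward of that sequence. Only the first move would be executed, since the second is reoptimized in the next timestep.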


E. MULTISEARCHERMOVE.M FUNCTION


This function takes in the number of searchers and their
characteristics as well as the state at the current time. It
returns the recommended move for the current timestep for
each searcher and whether or not that searcher will be
blocked (constrained to continue along that search) for the
next timestep.




The function accomplishes this by repeatedly calling the
SearcherMove.m function with different restrictions for each
unconstrained searcher (searchers can be constrained if
their previous move limits their next move, i.e., they are
still en route to their previous destination, or if they are
currently inactive, i.e., out of fuel or down). The function
first limits constrained searchers to their appropriate
moves and then updates the restricted movement list to
incorporate these moves. It then gets the optimal move for
each searcher by running the SearcherMove.m function and
storing these optimal moves. If there are no conflicts,
these are the optimal moves for the searchers; if there are
conflicts, the function updates the list of unavailable
moves for each searcher and determines the best scenario
possible using these conflicting searchers. It repeats this
process until there are no conflicts among the searchers,
and the result is the recommended movements for the next
timestep. This iterative process of eliminating possible
moves and recalculating optimal moves for each searcher can
save orders of magnitude in runtime over the total
enumeration method for all searchers combined, which tries
many moves that are nowhere near optimal strategies. Even in
the TTLP, total enumeration for a real-time experiment can
take too long; thus this iterative optimal move process is
an extremely important part of the ASOM algorithm.


F. POSITIVEBAYESIANPERM.M FUNCTION 


This function takes in the current target marginals and a
matrix of all the detections. It returns the resulting
target marginals after updating for the positive detections
in the current timestep. It is only appropriate to use this
function when all targets are of the same type; the more
general form of this function, and the one that is used in
practice, is PositiveBayesianPermutations.m.


This function works by creating a matrix of all different
(unordered) combinations of targets that could have been
seen during the timestep using the nchoosek.m MATLAB library
function. Next, for each different combination (each row of
the previously created matrix) it creates all different
permutations (ordered) of that combination using the perms.m
MATLAB library function. It combines all of these different
permutations into one big matrix of all possible
permutations for the target detections of the current
timestep. It is important to notice that these permutations
represent all of the different possible scenarios of target
detections. The function then determines the probability of
each of these scenarios occurring by multiplying together
the target marginals of each detected target at the location
where it was supposedly detected, then normalizing by
dividing each probability by the sum total of all
probabilities. After determining and normalizing the
probabilities, the function decides which scenario actually
occurred (according to the model/algorithm’s viewpoint)
based on a random number draw. Now that the algorithm has
picked out the scenario that occurred, it updates the target
marginals for all targets that were detected to be one at
the arc where they were detected and zero everywhere else,
thus spiking the probability of those targets.
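The scenario enumeration, weighting, random selection, and spiking described above can be sketched in Python, with itertools.combinations and itertools.permutations standing in for nchoosek.m and perms.m. The dictionary-based data layout and the function name are assumptions for illustration; the thesis's code works on MATLAB matrices.

```python
import itertools
import random

def positive_update(marginals, detections, rng=random):
    """Spike the marginals of the targets judged to have been detected.

    marginals: dict target -> dict location -> probability.
    detections: list of locations, one entry per detection this timestep.
    rng: any object with a random() method giving a uniform(0,1) draw.
    """
    targets = sorted(marginals)
    k = len(detections)
    # Unordered combinations of k targets, then every ordering of each.
    scenarios = [perm
                 for combo in itertools.combinations(targets, k)
                 for perm in itertools.permutations(combo)]
    # Scenario weight: product of each assigned target's marginal at
    # the location of the detection it is assigned to.
    weights = []
    for scenario in scenarios:
        w = 1.0
        for target, loc in zip(scenario, detections):
            w *= marginals[target].get(loc, 0.0)
        weights.append(w)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no scenario is consistent with the detections")
    # Choose a scenario by comparing a uniform draw with the
    # cumulative normalized probabilities.
    draw, cumulative, chosen = rng.random(), 0.0, None
    for scenario, w in zip(scenarios, weights):
        if w == 0.0:
            continue
        chosen = scenario
        cumulative += w / total
        if draw < cumulative:
            break
    updated = {t: dict(m) for t, m in marginals.items()}
    for target, loc in zip(chosen, detections):
        updated[target] = {l: (1.0 if l == loc else 0.0) for l in marginals[target]}
    return updated
```

Targets that are not part of the chosen scenario keep their previous marginals.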
G. POSITIVEBAYESIANPERMUTATIONS.M FUNCTION


This function takes in the current target marginals and an
array containing the information of each target type, as
well as a list of all detection locations and the type of
detection made at each location. It returns the resulting
target marginals after all positive Bayesian updates have
been made.


The function works by creating new temporary target marginal
matrices with an extra index representing all possible types
of targets. This will create many blank (by blank, we mean
no nonzero entries) levels of the target marginals of each
type, as there will only be nonzero entries if the target
type of the marginals index matches the actual type of the
target. In a similar manner, the function also creates a
temporary detection matrix with an extra index to indicate
detections of a certain type of target. Next, the function
calls the PositiveBayesianPerm.m function for each type
separately, meaning where the PositiveBayesianPerm.m
function is expecting the input of the target marginals and
a matrix of detections, we only give it one level of the
temporary target marginals and temporary detection matrix by
holding the type index fixed at its current value and
looping through all possibilities. This updates the
temporary target marginals for each type separately, but
since all values were zero except for targets whose type
matched the current type index, we simply have to sum over
the type index to return the final value of the actual
target marginals updated for positive detections.
H. NEGATIVEBAYESIAN.M FUNCTION


The NegativeBayesian.m function is the Bayesian update
function for nondetection. This is the traditional use of
Bayesian updating as described in the introduction. It takes
all values of target marginals where there was no detection
and updates them for the failed detection. The function
returns the updated values of the target marginals.





The function accomplishes this by looking at every value of
the target marginals that is less than 1; if there was a
detection there (thus giving a probability spike equal to
1), it does not apply negative Bayesian updating. If the
value of the target marginal is less than 1, the function
updates this probability to its previous value multiplied by
the probability of failed detection (1 - probability of
detection). After updating the probability of each target
marginal, the function normalizes each value by dividing it
by the sum total of the new probabilities. The result is the
new target marginals updated for failed detections.
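For a single target, this update can be sketched in Python as follows. The dictionary layout and names are assumptions for illustration and do not reproduce the thesis's MATLAB function.

```python
def negative_update(marginal, p_detect):
    """Bayesian update of one target's marginals after a failed search.

    marginal: dict location -> probability for one target.
    p_detect: dict location -> probability of detection there this
              timestep (locations that were not searched may be omitted).
    """
    updated = {}
    for loc, p in marginal.items():
        if p < 1.0:
            # Scale by the probability that the search missed the target.
            updated[loc] = p * (1.0 - p_detect.get(loc, 0.0))
        else:
            # A spike of 1 marks a positive detection; leave it alone.
            updated[loc] = p
    total = sum(updated.values())
    if total == 0.0:
        return dict(marginal)
    return {loc: p / total for loc, p in updated.items()}
```

For example, with marginals of 0.5 at each of two locations and a detection probability of 0.5 at the first, the unnormalized values are 0.25 and 0.5, which normalize to 1/3 and 2/3.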


I. MARGINALSMOVEMENT.M FUNCTION


The MarginalsMovement.m function takes in the current target
marginals as well as the speed of each of the targets and
returns the updated values of the target marginals after
incorporating possible movement for the current timestep
based on the Markov movement matrix.


The function accomplishes this by looping through each
target and, within that loop, through each arc (i,j) for
that target. First, it updates every arc to the new value
based on movement out of it for the next timestep by
multiplying by the movement matrix directly. Next, it
updates the values of nodes that have some probability
moving into them from adjacent roads. After that, it
multiplies the values on roads by the (1-TURN) probability
to lessen the values on arcs where the target could possibly
turn around. Finally, on every arc where it lowered the
probability to account for targets turning around, it raises
the probability on the reverse arc by the corresponding
amount.
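The last two steps, the turn-around adjustment, can be sketched in Python. The sketch covers only that adjustment, not the multiplication by the movement matrix. The dictionary keyed by arc (i,j), with (i,i) meaning the target is stationary at node i, is an assumed layout for illustration.

```python
def apply_turnaround(arc_probs, turn):
    """Shift a fraction `turn` of each road's probability to its reverse arc.

    arc_probs: dict (i, j) -> probability for one target.
    turn: probability that a target on a road turns around.
    """
    updated = dict(arc_probs)
    for (i, j), p in arc_probs.items():
        if i == j:
            continue                      # nodes are not affected
        moved = turn * p                  # mass that reverses direction
        updated[(i, j)] -= moved          # same as multiplying by (1 - turn)
        updated[(j, i)] = updated.get((j, i), 0.0) + moved
    return updated
```

Because every amount removed from an arc is added to its reverse arc, the total probability is unchanged.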


J. MOVEMENT.M FUNCTION


The Movement.m function is one of two functions that help
model the target movement for experimentation. It is not
actually used in the step function, nor during the actual
experiment, but rather aids in the generation of random
routes for targets to travel during experimentation. It is
called in the TargetMovement.m function to return the next
move of a target that needs a new destination. It takes in
the old position of the target and the Markov movement
matrix. It returns the new destination node of that target.


This function works by looking at the Markov movement matrix
in the row of the starting position of the target (which
will sum to 1, by definition) and making a random draw from
a uniform(0,1) distribution. With this random number, the
function returns the column whose cumulative probability
interval contains the random number drawn.
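This inverse-CDF style of sampling can be sketched in Python. The list-of-rows matrix format and the function name are assumptions for illustration, not the thesis's MATLAB code.

```python
import random

def next_destination(position, movement_matrix, rng=random):
    """Sample the next node from the Markov row of the current position.

    movement_matrix: list of rows, each summing to 1.
    rng: any object with a random() method giving a uniform(0,1) draw.
    """
    row = movement_matrix[position]
    draw = rng.random()
    cumulative = 0.0
    destination = None
    for column, p in enumerate(row):
        if p <= 0.0:
            continue                      # unreachable destination
        destination = column
        cumulative += p
        if draw < cumulative:
            break                         # draw fell in this column's interval
    return destination
```

If rounding leaves the cumulative sum slightly below the draw, the loop falls through and the last reachable column is returned.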


K. TARGETMOVEMENT.M FUNCTION


This is the second of two functions made to model target
movement for experimentation. It takes in the amount of time
the targets will move around, the number of targets, a speed
array containing the speed of each target, and the starting
positions of the targets. It returns the final positions of
the targets after they have moved for the amount of time
input. The output matrix has one row for each target and
three columns, with the first two representing the start and
finish nodes of the current arc the target is on (if start
and finish nodes are equal, the target is stationary at that
node), and the third being how many timesteps the target has
remaining on that arc before completing it. If the user
would like to see every movement in the sequence, the
function can be run repeatedly with end time equal to one
timestep, updating the start positions with the output
positions from the previous step.


This function works by entering a “while” loop until the
simulation time reaches the end time input. It then loops
through each target to update their positions one at a time.
If the current target is stationary at a node, it calls the
movement function to get a new destination node (which could
be to remain at the same node for another timestep);
otherwise the target remains on the road where it was
previously located. It then makes a draw from a uniform(0,1)
distribution; if this random draw is less than the turn
probability, the function reverses the arc and the number of
moves remaining to complete that arc; otherwise the function
only updates the number of moves remaining until completion
of its current arc. Finally, the function stores all of the
new information in the output matrix and increments time for
the next timestep.